Results 1 - 20 of 3,755
1.
Sci Rep ; 14(1): 8165, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38589653

ABSTRACT

Accurately calling indels from next-generation sequencing (NGS) data is critical for clinical applications. The precisionFDA team collaborated with the U.S. Food and Drug Administration's (FDA's) National Center for Toxicological Research (NCTR) and successfully completed the NCTR Indel Calling from Oncopanel Sequencing Data Challenge, which evaluated the performance of indel calling pipelines. Top performers were selected based on precision, recall, and F1-score. Many other pipelines performed close to the top performers, forming a top cluster of performers. Performance was significantly higher in high-confidence regions and coding regions, and significantly lower in low-complexity regions. Oncopanel capture and other issues may have affected the recall rate. Indels with higher variant allele frequency (VAF) may generally be called with higher confidence. Many of the indel calling pipelines performed well; some performed well across all three oncopanels, while others were better for a specific oncopanel. Indel calling performance can be further improved by restricting calls to high-confidence intervals (HCIs) and coding regions, and by excluding low-complexity regions (LCRs). VAF cut-offs could be applied depending on the application.
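
Precision, recall and F1-score are the selection criteria named above, and the abstract recommends restricting calls to high-confidence intervals. A minimal sketch of both, assuming variants are compared by exact (chrom, pos, ref, alt) match and that regions are hypothetical inclusive (chrom, start, end) intervals in the same coordinates as pos. This is not the challenge's actual scoring code.

```python
def in_regions(variant, regions):
    """True if the variant's position lies in any inclusive (chrom, start, end) interval."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def benchmark(calls, truth, regions=None):
    """Exact-match precision, recall and F1 of a call set against a truth set.

    calls, truth: sets of (chrom, pos, ref, alt) tuples.
    regions: optional list of (chrom, start, end) intervals (e.g. high-confidence
    intervals); when given, both sets are restricted to them before scoring.
    """
    if regions is not None:
        calls = {v for v in calls if in_regions(v, regions)}
        truth = {v for v in truth if in_regions(v, regions)}
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Exact matching undercounts true positives when a pipeline writes the same indel with a different but equivalent representation. Dedicated comparison tools such as hap.py or RTG vcfeval compare haplotypes instead of raw records to avoid this.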


Subjects
High-Throughput Nucleotide Sequencing, INDEL Mutation, Single Nucleotide Polymorphism
2.
Anim Biotechnol ; 35(1): 2337751, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38597900

ABSTRACT

Improving the economic efficiency of sheep breeding, with the aim of enhancing productivity, is a focal point of sheep breeding improvement. Recent studies highlight the involvement of the Early Region 2 Binding Factor transcription factor 8 (E2F8) gene in female reproduction. Our group's recent genome-wide association study (GWAS) points to a potential impact of the E2F8 gene on prolificacy traits in Australian White (AUW) sheep. The purpose of this study was therefore to assess the association of the E2F8 gene with litter size in the AUW sheep breed. A total of 659 AUW sheep were genotyped using PCR-based genotyping technology. The results showed significant associations between the P1-del-32bp InDel and litter size at the fourth and fifth parities in AUW sheep; ewes with the ID genotype had larger litters than those with the DD and II genotypes. These results indicate that the P1-del-32bp InDel within the E2F8 gene could be useful in marker-assisted selection (MAS) in sheep.


Subjects
Genome-Wide Association Study, INDEL Mutation, Female, Animals, Sheep/genetics, Pregnancy, Australia, Litter Size/genetics, Genotype, INDEL Mutation/genetics
3.
BMC Genomics ; 25(1): 329, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38566035

ABSTRACT

BACKGROUND: A novel multiplex system of 64 loci was previously constructed on a capillary electrophoresis platform, comprising 59 autosomal insertion/deletions (A-InDels), two Y-chromosome InDels, two mini short tandem repeats (miniSTRs), and the Amelogenin gene. This study aimed to evaluate the efficiency of this multiplex system for individual identification, paternity testing, and biogeographic ancestry inference in the Chinese Hezhou Han (CHH) and Hubei Tujia (CTH) groups, providing valuable insights for forensic anthropology and population genetics research. RESULTS: The cumulative power of discrimination (CDP) and cumulative probability of exclusion (CPE) for the 59 A-InDels and two miniSTRs were 0.99999999999999999999999999754 and 0.99999905 in the CTH group, and 0.99999999999999999999999999998 and 0.99999898 in the CHH group, respectively. When the likelihood ratio threshold was set to 1 or 10, more than 95% of full-sibling pairs could be distinguished from unrelated individual pairs, with false-positive rates below 1.2% in both the CTH and CHH groups. Biogeographic ancestry inference models based on 35 populations were constructed with three algorithms (random forest, adaptive boosting, and extreme gradient boosting) and tested by 10-fold cross-validation, giving average accuracies of 86.59%, 84.22%, and 87.80%, respectively. In addition, we investigated the genetic relationships between the two studied groups and 33 reference populations using FST, DA, phylogenetic tree, PCA, STRUCTURE, and TreeMix analyses. Compared with other continental populations, the CTH and CHH groups had closer genetic affinities to East Asian populations. CONCLUSIONS: This novel multiplex system has high CDP and CPE in the CTH and CHH groups and can be used as a powerful tool for individual identification and paternity testing. According to various genetic analysis methods, the genetic structures of the CTH and CHH groups are relatively similar to those of the reference East Asian populations.
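
Cumulative values such as those above are conventionally obtained by combining per-locus values under the assumption that loci are independent: CDP = 1 - ∏(1 - PD_i), and likewise for CPE. The sketch below shows that combination with hypothetical per-locus inputs. It also returns the complement ∏(1 - PD_i), because a double-precision float cannot hold a value as close to 1 as the CDPs reported here.

```python
import math


def combined_power(per_locus):
    """Combine per-locus discrimination (or exclusion) powers, assuming independent loci.

    Returns (cumulative, complement), where complement = prod(1 - p_i) and
    cumulative = 1 - complement. The complement is returned separately because
    for dozens of loci it can be ~1e-27, and 1 - 1e-27 rounds to exactly 1.0.
    """
    complement = math.prod(1.0 - p for p in per_locus)
    return 1.0 - complement, complement
```

For example, two loci with PD = 0.5 give a cumulative value of 0.75. With 27 hypothetical loci of PD = 0.9, the complement is about 1e-27 while the cumulative value comes back as exactly 1.0. This is why tiny random-match probabilities are usually computed and reported on the complement (or log) scale.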


Subjects
Population Genetics, Siblings, Humans, Phylogeny, China, INDEL Mutation, Microsatellite Repeats, Forensic Genetics/methods, Gene Frequency
4.
Genome Biol ; 25(1): 101, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641647

ABSTRACT

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.
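
The abstract does not define biastools' own bias metrics or categories. As a generic illustration of what reference bias means at a heterozygous site, the sketch below computes the mean deviation of the reference-allele read fraction from the expected 0.5. The read counts are hypothetical, and this is not biastools' categorization.

```python
def reference_bias(het_sites):
    """Mean deviation of the reference-allele read fraction from 0.5.

    het_sites: list of (ref_reads, alt_reads) at sites known to be heterozygous.
    A positive result means reads favour the reference allele; sites with no
    informative reads are skipped.
    """
    fractions = [r / (r + a) for r, a in het_sites if r + a > 0]
    if not fractions:
        return 0.0
    return sum(f - 0.5 for f in fractions) / len(fractions)
```

A site with 40 reference and 20 alternate reads has a reference fraction of 2/3. Averaged with a balanced 30/30 site, it gives a mean bias of 1/12 towards the reference.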


Subjects
Genome, Genomics, Genomics/methods, Computational Biology, INDEL Mutation, Bias, DNA Sequence Analysis/methods, Software, High-Throughput Nucleotide Sequencing/methods
5.
Cell ; 187(8): 1955-1970.e23, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38503282

ABSTRACT

Characterizing somatic mutations in the brain is important for disentangling the complex mechanisms of aging, yet little is known about mutational patterns in different brain cell types. Here, we performed whole-genome sequencing (WGS) of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals aged 0.4-104 years and identified >92,000 somatic single-nucleotide variants (sSNVs) and small insertions/deletions (indels). Although both cell types accumulated somatic mutations linearly with age, oligodendrocytes accumulated sSNVs 81% faster than neurons and indels 28% slower than neurons. Correlating mutations with single-nucleus RNA profiles and chromatin accessibility from the same brains revealed that oligodendrocyte mutations are enriched in inactive genomic regions and are distributed across the genome similarly to mutations in brain cancers. In contrast, neuronal mutations are enriched in open, transcriptionally active chromatin. These stark differences suggest that different sets of mutagenic processes are active in oligodendrocytes and neurons.
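
Statements such as "81% faster" compare per-year accumulation rates, that is, the slopes of mutation count against age. The abstract does not state the exact model used (it may, for instance, account for per-individual effects). The sketch below therefore only illustrates the arithmetic, using an ordinary least-squares slope per cell type on hypothetical, perfectly linear data.

```python
def ols_slope(ages, counts):
    """Ordinary least-squares slope of mutation count against age (mutations per year)."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, counts))
    sxx = sum((x - mean_x) ** 2 for x in ages)
    return sxy / sxx


def relative_rate(slope_a, slope_b):
    """Percent by which rate a exceeds rate b; a negative value means a is slower."""
    return 100.0 * (slope_a / slope_b - 1.0)
```

With hypothetical slopes of 18 and 10 mutations per year, `relative_rate` gives about 80 (80% faster). A negative result corresponds to the "slower" indel comparison above.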


Subjects
Aging, Brain, Neurons, Oligodendroglia, Humans, Aging/genetics, Aging/pathology, Chromatin/genetics, Chromatin/metabolism, Mutation, Neurons/metabolism, Neurons/pathology, Oligodendroglia/metabolism, Oligodendroglia/pathology, Single-Cell Gene Expression Analysis, Whole Genome Sequencing, Brain/metabolism, Brain/pathology, Single Nucleotide Polymorphism, INDEL Mutation, Biological Specimen Banks, Oligodendrocyte Precursor Cells/metabolism, Oligodendrocyte Precursor Cells/pathology
6.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38426352

ABSTRACT

MOTIVATION: Intra-host variants are genetic variations or mutations that occur within an individual host organism. These variants are typically studied in viruses, bacteria, or other pathogens to understand pathogen evolution. Intra-host variants are also explored in tumor biology and mitochondrial biology to characterize somatic mutations and inherited heteroplasmic mutations. Intra-host variants can involve long insertions, deletions, and combinations of different mutation types, which makes them challenging to identify. The performance of current methods in detecting complex intra-host variants is unknown. RESULTS: First, we simulated a dataset of 10 samples with 1869 intra-host variants involving various mutation patterns and benchmarked current variant detection software. Although current tools detected most variants, with F1-scores between 0.76 and 0.97, their performance in detecting long indels and low-frequency variants was limited. We therefore developed a new tool, PySNV, for the detection of complex intra-host variations. On the simulated dataset, PySNV successfully detected 1863 variant cases (F1-score: 0.99) and showed the highest Pearson correlation coefficient (PCC: 0.99) with the ground truth in predicting variant frequencies. PySNV delivered promising performance even for long indels and low-frequency variants, while maintaining computational speed comparable to other methods. Finally, we tested its performance on SARS-CoV-2 replicate sequencing data and found that it reported 21% more variants than LoFreq, the best-performing benchmarked software, while showing higher consistency within replicates (62% versus 54%). The discrepancies occurred mostly in low-depth regions and in low-frequency variants. AVAILABILITY AND IMPLEMENTATION: https://github.com/bnuLyndon/PySNV/.
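
Frequency-prediction accuracy above is summarized by the Pearson correlation coefficient (PCC) between predicted and true variant frequencies. A minimal pure-Python version follows; the frequency values used to exercise it are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

PCC measures linear agreement only. A caller that systematically doubles every frequency would still score 1.0, so it is usually read alongside an error measure.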


Subjects
High-Throughput Nucleotide Sequencing, Software, High-Throughput Nucleotide Sequencing/methods, Mutation, INDEL Mutation, Genetic Variation
7.
Sci Rep ; 14(1): 7028, 2024 03 25.
Article in English | MEDLINE | ID: mdl-38528062

ABSTRACT

Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known-positive set contained a limited number of indels. This project sought to provide an enriched, more translationally relevant set of known indels by focusing on additional cancer-related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers expanded the known indel set by 516 additional indels. The extended benchmarking indel set spans a large range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of VAFs, particularly at the low end. Indel length also varied, but the majority were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extensive indel set has not undergone orthogonal validation. The extended benchmarking indel set, together with the indels in the previously published known-positive set, served as the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can help improve the performance of these pipelines.
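
Because most of the extended truth set has a VAF below 20%, a single overall recall can hide poor sensitivity at low VAF. A minimal sketch of VAF-stratified recall follows. It assumes truth indels are keyed by hypothetical IDs with known VAFs, and that the bin edges are chosen by the user.

```python
def recall_by_vaf(truth_vafs, called, edges):
    """Recall of truth indels stratified by variant allele frequency (VAF).

    truth_vafs: dict mapping a variant ID to its VAF in the reference sample.
    called: set of variant IDs reported by a pipeline.
    edges: increasing bin edges, e.g. [0, 0.05, 0.2, 1.0]. Bins are [lo, hi),
    except the last bin, which also includes its upper edge.
    Returns a list of (lo, hi, n_truth, recall); recall is None for empty bins.
    """
    out = []
    last = len(edges) - 2  # index of the final bin
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        ids = [v for v, f in truth_vafs.items()
               if lo <= f < hi or (i == last and f == hi)]
        hits = sum(1 for v in ids if v in called)
        out.append((lo, hi, len(ids), hits / len(ids) if ids else None))
    return out
```

Reporting n_truth per bin matters: a recall of 1.0 from a single low-VAF indel says far less than the same value over hundreds.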


Subjects
Benchmarking, High-Throughput Nucleotide Sequencing, Humans, Computational Biology, Quality Control, INDEL Mutation, Single Nucleotide Polymorphism
8.
Gene ; 908: 148246, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38325665

ABSTRACT

Changes in the nervous system are related to a wide range of mental disorders, including neurodevelopmental disorders (NDD), which are characterized by early-onset mental conditions such as schizophrenia, autism spectrum disorder (ASD), and related conditions. Previous studies have shown distinct genetic components associated with diverse schizophrenia and ASD phenotypes, mostly focusing on rescuing neural phenotypes and brain activity, while alterations related to vision have been overlooked. Because vision depends on the eye, which itself represents a part of the brain, and because the retina is formed by neurons and glia-derived cells, genetic variations affecting the brain can also affect vision. Here, we performed a critical systematic literature review to screen for all genetic variations in individuals with NDD who had reported alterations in vision. Using these restrictive criteria, we found 20 genes with distinct types of genetic variations, inherited or de novo, including SNPs, SNVs, deletions, insertions, duplications, and indels. Variations within protein-coding regions have different impacts on protein formation, such as missense, nonsense, or frameshift changes. A molecular analysis of the 20 genes revealed that 17 shared a common protein-protein or genetic interaction network. Moreover, gene expression analysis in samples from the brain and other tissues indicated that 18 of the genes are highly expressed in the brain and retina, suggesting a potential role in the adult vision phenotype. Finally, only 3 of the genes from our study are described in standard public ophthalmogenetics databanks, suggesting that the other 17 genes could be novel targets for vision diseases.


Subjects
Autism Spectrum Disorder, Neurodevelopmental Disorders, Adult, Humans, Gene Regulatory Networks, Neurodevelopmental Disorders/genetics, Autism Spectrum Disorder/genetics, Phenotype, INDEL Mutation
9.
Nat Genet ; 56(3): 541-552, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38361034

ABSTRACT

Mutational signature analysis is a recent computational approach for interpreting somatic mutations in the genome. Its application to cancer data has enhanced our understanding of mutational forces driving tumorigenesis and demonstrated its potential to inform prognosis and treatment decisions. However, methodological challenges remain for discovering new signatures and assigning proper weights to existing signatures, thereby hindering broader clinical applications. Here we present Mutational Signature Calculator (MuSiCal), a rigorous analytical framework with algorithms that solve major problems in the standard workflow. Our simulation studies demonstrate that MuSiCal outperforms state-of-the-art algorithms for both signature discovery and assignment. By reanalyzing more than 2,700 cancer genomes, we provide an improved catalog of signatures and their assignments, discover nine indel signatures absent in the current catalog, resolve long-standing issues with the ambiguous 'flat' signatures and give insights into signatures with unknown etiologies. We expect MuSiCal and the improved catalog to be a step towards establishing best practices for mutational signature analysis.
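
The abstract does not spell out MuSiCal's algorithms. As a point of reference, the sketch below shows the basic idea of signature assignment: refitting a sample's mutation catalog as a non-negative combination of known signatures. It uses Lee-Seung multiplicative updates for the squared-error objective with the signature matrix held fixed. It is not MuSiCal's method, and the four-channel signatures and counts are toy values (real catalogs use 96 SBS or 83 indel channels).

```python
def refit_exposures(signatures, catalog, iterations=1000):
    """Non-negative exposures h with catalog ~= sum_j h[j] * signatures[j].

    signatures: list of k signatures, each a list of m channel probabilities.
    catalog: list of m mutation counts for one sample.
    Each iteration applies h_j <- h_j * (W^T v)_j / (W^T W h)_j, which keeps
    h non-negative because every factor is non-negative.
    """
    k, m = len(signatures), len(catalog)
    wtv = [sum(signatures[j][c] * catalog[c] for c in range(m)) for j in range(k)]
    wtw = [[sum(signatures[i][c] * signatures[j][c] for c in range(m))
            for j in range(k)] for i in range(k)]
    h = [1.0] * k  # strictly positive start; zeros would never move
    for _ in range(iterations):
        wtwh = [sum(wtw[i][j] * h[j] for j in range(k)) for i in range(k)]
        h = [h[i] * wtv[i] / wtwh[i] if wtwh[i] > 0 else 0.0 for i in range(k)]
    return h
```

A catalog built as 3 × signature 1 + 2 × signature 2 is recovered as exposures close to [3, 2]. The hard part the abstract alludes to is deciding *which* signatures to include. Plain refitting tends to spread small exposures across every signature offered, which is what sparsity-aware assignment methods try to avoid.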


Subjects
Music, Neoplasms, Humans, Neoplasms/genetics, Mutation, Carcinogenesis/genetics, INDEL Mutation
10.
Nature ; 627(8004): 586-593, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38355797

ABSTRACT

Over half of hepatocellular carcinoma (HCC) cases diagnosed worldwide are in China [1-3]. However, whole-genome analysis of hepatitis B virus (HBV)-associated HCC in Chinese individuals is limited [4-8], with current analyses of HCC coming mainly from non-HBV-enriched populations [9,10]. Here we initiated the Chinese Liver Cancer Atlas (CLCA) project and performed deep whole-genome sequencing (average depth, 120×) of 494 HCC tumours. We identified 6 coding and 28 non-coding previously undescribed driver candidates. Five previously undescribed mutational signatures were found, including aristolochic-acid-associated indel and doublet base signatures, and a single-base-substitution signature that we termed SBS_H8. Pentanucleotide context analysis and experimental validation confirmed that SBS_H8 was distinct from the aristolochic-acid-associated SBS22. Notably, HBV integrations could take the form of extrachromosomal circular DNA, resulting in elevated copy numbers and gene expression. Our high-depth data also enabled us to characterize subclonal clustered alterations, including chromothripsis, chromoplexy and kataegis, suggesting that these catastrophic events could also occur in late stages of hepatocarcinogenesis. Pathway analysis of all classes of alterations further linked non-coding mutations to dysregulation of liver metabolism. Finally, we performed in vitro and in vivo assays to show that fibrinogen alpha chain (FGA), identified as both a candidate coding and non-coding driver, regulates HCC progression and metastasis. Our CLCA study depicts a detailed genomic landscape and evolutionary history of HCC in Chinese individuals, with important clinical implications.
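
A pentanucleotide context extends the usual trinucleotide context to two flanking bases on each side of a substitution. The abstract does not describe how the authors encoded contexts. The sketch below uses the common convention of reporting each substitution from the pyrimidine (C or T) strand, applied to hypothetical sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def pentanucleotide_context(seq, pos, alt):
    """Format a single-base substitution as 'NN[REF>ALT]NN' in pyrimidine orientation.

    seq: reference sequence; pos: 0-based index of the mutated base, which must
    have two flanking bases on each side; alt: the substituted base.
    Purine reference bases (A/G) are reported on the opposite strand, which
    collapses the 12 possible substitution types into 6.
    """
    if pos < 2 or pos + 3 > len(seq):
        raise ValueError("need two flanking bases on each side")
    ctx = seq[pos - 2:pos + 3].upper()
    alt = alt.upper()
    if ctx[2] in "AG":
        ctx, alt = revcomp(ctx), alt.translate(COMPLEMENT)
    return f"{ctx[:2]}[{ctx[2]}>{alt}]{ctx[3:]}"
```

For example, a G>A change in `AACGTTA` at index 3 is reported as `AA[C>T]GT`, the same class as a C>T change read on the other strand.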


Subjects
Hepatocellular Carcinoma, Human Genome, High-Throughput Nucleotide Sequencing, Liver Neoplasms, Mutation, Whole Genome Sequencing, Humans, Aristolochic Acids/metabolism, Carcinogenesis, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/virology, China, Chromothripsis, Disease Progression, Circular DNA/genetics, East Asian People/genetics, Molecular Evolution, Human Genome/genetics, Hepatitis B Virus/genetics, INDEL Mutation/genetics, Liver/metabolism, Liver Neoplasms/genetics, Liver Neoplasms/virology, Mutation/genetics, Neoplasm Metastasis/genetics, Open Reading Frames/genetics, Reproducibility of Results
11.
CRISPR J ; 7(1): 29-40, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38353621

ABSTRACT

The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been widely used to create animal models for biomedical and agricultural applications owing to its low cost and ease of use. However, the occurrence of erroneous cleavage (off-targeting) may raise concerns about the practical application of the CRISPR-Cas9 system. In this study, we created a melanocortin 1 receptor (MC1R)-edited pig model through somatic cell nuclear transfer (SCNT) using porcine kidney cells modified by the CRISPR-Cas9 system. We then carried out whole-genome sequencing of two MC1R-edited pigs and two cloned wild-type siblings, together with the donor cells, to assess the genome-wide presence of single-nucleotide variants and small insertions and deletions (indels), and found only one candidate off-target indel, present in both MC1R-edited pigs. In summary, our study indicates that the minimal off-targeting effect induced by CRISPR-Cas9 may not be a major concern in gene-edited pigs created by SCNT.
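
The design above compares edited animals with wild-type siblings and donor cells. A simplified sketch of the basic filtering logic follows, assuming variant calls are already available as sets of (chrom, pos, ref, alt) keys. Real analyses also weigh sequencing depth, genotype quality and similarity to the guide target sequence, none of which is shown. Sample contents are hypothetical.

```python
def candidate_offtargets(edited, controls):
    """Variants shared by every edited animal and absent from all controls.

    edited: list of variant sets, one per edited animal.
    controls: list of variant sets (e.g. wild-type siblings, donor cells).
    Variants are hashable keys such as (chrom, pos, ref, alt).
    """
    shared = set.intersection(*edited) if edited else set()
    background = set().union(*controls)  # empty set when there are no controls
    return shared - background
```

Requiring a variant in every edited animal and in no control removes variants inherited from the donor cells or that arose independently in one clone. Anything left is only a *candidate* until validated.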


Subjects
CRISPR-Cas Systems, Melanocortin 1 Receptor, Animals, Swine/genetics, Melanocortin 1 Receptor/genetics, CRISPR-Cas Systems/genetics, Gene Editing, Mutation, INDEL Mutation/genetics
12.
Cells ; 13(3)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38334653

ABSTRACT

Successful genome editing depends on the cleavage efficiency of programmable nucleases (PNs) such as the CRISPR-Cas system. Various methods have been developed to assess the efficiency of PNs, most of which estimate the occurrence of indels caused by PN-induced double-strand breaks. In these methods, PN genomic target sites are amplified by PCR, and the resulting PCR products are analyzed by Sanger sequencing, high-throughput sequencing, or mismatch detection assays. Among these methods, Sanger sequencing of PCR products followed by indel analysis with online web tools has gained popularity because it is user-friendly. This approach estimates indel frequencies by computationally analyzing sequencing trace data. However, the accuracy of these computational tools remains uncertain. In this study, we compared the performance of four web tools, TIDE, ICE, DECODR, and SeqScreener, using artificial sequencing templates with predetermined indels. Our results demonstrated that these tools estimated indel frequency with acceptable accuracy when the indels were simple and contained only a few base changes. However, the estimates became more variable among the tools when the sequencing templates contained more complex indels or knock-in sequences. Moreover, although these tools effectively estimated net indel sizes, their ability to deconvolute indel sequences varied and had certain limitations. These findings underscore the importance of carefully selecting an appropriate tool, and using it with caution, depending on the type of genome editing being performed.
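
The four web tools decompose Sanger traces, which is not shown here. The abstract also mentions the high-throughput sequencing route, where indel frequency is commonly estimated by counting aligned reads that carry an insertion or deletion near the cut site. A simplified sketch of that counting follows, assuming 0-based alignment starts and CIGAR strings as in SAM. The ±5 bp window and the reads are hypothetical, and real tools also handle sequencing errors and ambiguous indel placement.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel_near(start, cigar, cut, window):
    """True if an alignment has an insertion or deletion within `window` bp of `cut`.

    start: 0-based reference position of the first aligned base.
    M, D, N, = and X advance the reference coordinate; I, S, H and P do not.
    """
    ref = start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I" and cut - window <= ref <= cut + window:
            return True
        if op == "D" and ref <= cut + window and ref + n - 1 >= cut - window:
            return True
        if op in "MDN=X":
            ref += n
    return False


def indel_frequency(reads, cut, window=5):
    """Fraction of (start, cigar) alignments that carry an indel near the cut site."""
    if not reads:
        return 0.0
    hits = sum(has_indel_near(s, c, cut, window) for s, c in reads)
    return hits / len(reads)
```

For a hypothetical cut site at position 100, a read with `48M2D50M` starting at 50 deletes positions 98-99 and counts as edited. A read with an insertion at position 70 does not.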


Subjects
CRISPR-Cas Systems, Gene Editing, Gene Editing/methods, CRISPR-Cas Systems/genetics, INDEL Mutation/genetics, Genome/genetics, Genomics
13.
J Genet ; 103, 2024.
Article in English | MEDLINE | ID: mdl-38258319

ABSTRACT

The molecular basis of adaptation remains elusive despite our ability to sequence genomes and transcriptomes. At present, most genomic research on selection focuses on signatures of selective sweeps in patterns of heterozygosity. Other research has studied changes in patterns of gene expression in evolving populations but has not usually identified the genetic changes causing these shifts in expression. Here we attempt to go beyond these approaches by using machine learning tools to explore interactions between the genome, transcriptome, and life-history phenotypes in two groups of 10 experimentally evolved Drosophila populations subjected to selection for opposing life-history patterns. Our findings indicate that genomic and transcriptomic data have comparable power for predicting phenotypic characters. Looking at the relationships between the genome and the transcriptome, we find that the expression of individual transcripts is influenced by many sites across the genome that are differentiated between the two types of populations. We find that single-nucleotide polymorphisms (SNPs), transposable elements, and indels are powerful predictors of gene expression. Collectively, our results suggest that the genomic architecture of adaptation is highly polygenic with extensive pleiotropy.


Subjects
Drosophila, Genomics, Animals, Drosophila/genetics, Gene Expression Profiling, Heterozygote, INDEL Mutation
14.
Sci Rep ; 14(1): 2232, 2024 01 26.
Article in English | MEDLINE | ID: mdl-38278837

ABSTRACT

This paper focuses on the correction of Illumina whole-genome sequencing (WGS) reads. We provide an extensive evaluation of existing correctors, measuring the impact of correction on variant calling (VC) as well as on de novo assembly. The results show that, in selected cases, read correction improves the quality of VC results. We also examine the algorithms' behaviour when processing Illumina NovaSeq reads, which have different quality characteristics from reads produced by older sequencers, and show that most of the algorithms can cope with such reads. Finally, we introduce a new version of RECKONER, our read corrector, which we optimized and equipped with a new correction strategy. RECKONER can now correct high-coverage human reads in less than 2.5 h, handles two types of read errors (indels and substitutions), and uses a new correction verification technique based on oligomers of two different lengths.
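
The abstract does not detail RECKONER's algorithm beyond its use of oligomers (k-mers) of two lengths for verification. For orientation, the sketch below shows the generic k-mer-spectrum idea used by many read correctors, with a single k. K-mers seen fewer than a threshold number of times are treated as likely errors, and a read is repaired by the first single-base substitution that makes all of its k-mers "solid". It handles substitutions only, and k, the threshold and the reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def is_solid(read, counts, k, threshold):
    """A read is solid if each of its k-mers occurs at least `threshold` times."""
    return all(counts[read[i:i + k]] >= threshold for i in range(len(read) - k + 1))


def correct_read(read, counts, k, threshold=2):
    """Return the read unchanged if solid; otherwise try single-base substitutions."""
    if is_solid(read, counts, k, threshold):
        return read
    for i in range(len(read)):
        for base in "ACGT":
            if base == read[i]:
                continue
            candidate = read[:i] + base + read[i + 1:]
            if is_solid(candidate, counts, k, threshold):
                return candidate
    return read  # no single substitution makes every k-mer solid
```

A second verification with a different k, as RECKONER's abstract describes, would help reject substitutions that happen to make the read solid at one k-mer length only.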


Subjects
Algorithms, High-Throughput Nucleotide Sequencing, Humans, Aged, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, INDEL Mutation
15.
Nat Commun ; 15(1): 837, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38281971

ABSTRACT

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
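
Low-coverage designs trade sequencing depth for sample numbers. Under the idealized Lander-Waterman model, mean depth is c = N·L/G and per-base depth is approximately Poisson(c), which gives a quick estimate of how much of a genome reaches a given depth. The sketch assumes perfectly uniform sampling, which real data do not achieve, especially in the complex, medically relevant genes discussed above. All inputs are hypothetical.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth c = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_covered(coverage, min_depth=1):
    """Expected fraction of bases with depth >= min_depth under a Poisson(c) model."""
    p_below = sum(math.exp(-coverage) * coverage ** d / math.factorial(d)
                  for d in range(min_depth))
    return 1.0 - p_below
```

At 1× mean depth, only about 63% of bases (1 - e^-1) are expected to be covered at all. At 30×, nearly every base is expected to reach 10 reads, under the same uniformity assumption.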


Subjects
High-Throughput Nucleotide Sequencing, Population Health, Humans, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Human Genome, INDEL Mutation
16.
Bioinformatics ; 40(2)2024 01 02.
Article in English | MEDLINE | ID: mdl-38261650

ABSTRACT

MOTIVATION: Many genetics studies report results tied to the genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers must either realign raw sequence data to the updated coordinate system or convert legacy datasets to it in order to combine results with newer datasets. Currently available tools for converting genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, which lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates. RESULTS: Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format, with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It also supports updating variant annotation fields whenever the reference allele changes across genome assemblies. The tool has the lowest rate of dropped variants, with an order of magnitude fewer indels dropped or incorrectly converted, and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies, as well as summary statistics from genome-wide association studies tied to legacy genome assemblies. AVAILABILITY AND IMPLEMENTATION: The tool is written in C and is freely available under the MIT open-source license as a BCFtools plugin at http://github.com/freeseek/score.
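
The same indel can be written with different REF/ALT alleles and positions, for example inside a repeat. This is one reason converting indels between assemblies is error-prone. The sketch below shows the widely used left-align-and-trim normalization of a single variant against a reference string, the same idea `bcftools norm` applies to VCF records. It is not the BCFtools/liftover algorithm. Coordinates are 0-based, the sequence and variants are hypothetical, and a variant at the very start of the sequence is not handled.

```python
def normalize_indel(seq, pos, ref, alt):
    """Left-align and trim a VCF-style variant against a reference sequence.

    seq: reference sequence string; pos: 0-based index of ref[0] in seq.
    Repeatedly trims a shared last base and, whenever an allele becomes empty,
    extends both alleles one base to the left; then trims shared leading bases
    while keeping at least one base in each allele. Returns (pos, ref, alt).
    """
    if seq[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    changed = True
    while changed:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]  # drop a shared trailing base
            changed = True
        if (not ref or not alt) and pos > 0:
            pos -= 1  # re-anchor one base to the left
            ref, alt = seq[pos] + ref, seq[pos] + alt
            changed = True
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]  # drop a redundant shared leading base
        pos += 1
    return pos, ref, alt
```

In `GCACACAT`, the deletion written as (4, "ACA", "A") and the one written as (0, "GCA", "G") both remove one CA unit. After normalization they compare equal.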


Subjects
Genome-Wide Association Study, Software, Genomics/methods, Alleles, INDEL Mutation
17.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38269647

ABSTRACT

MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels have been shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. Accurate inference of the indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably alignment software. Previous studies counted the number of contiguous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased by the assumption of alignment methods that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test whether current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
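
To make the model comparison concrete, the sketch below fits the two length distributions discussed above, geometric and Zipf, by maximum likelihood and compares their log-likelihoods. Unlike the study, it assumes the true indel lengths are observed directly, with no alignment error and no Approximate Bayesian Computation. The Zipf support is truncated at a hypothetical maximum length of 50, its exponent is found by a simple grid search, and the length data are hypothetical.

```python
import math


def geometric_fit(lengths):
    """MLE geometric distribution on {1, 2, ...}: P(k) = p * (1 - p)^(k - 1).

    Requires at least one length > 1 (otherwise p = 1 and log(1 - p) fails).
    """
    p = len(lengths) / sum(lengths)  # MLE is 1 / mean length
    loglik = sum(math.log(p) + (k - 1) * math.log(1 - p) for k in lengths)
    return p, loglik


def zipf_fit(lengths, max_len=50):
    """Grid-search MLE of a Zipf distribution truncated to {1, ..., max_len}.

    P(k) = k^-a / sum_{j=1}^{max_len} j^-a, with a searched over 1.01 .. 3.99.
    All lengths must be <= max_len.
    """
    best_a, best_ll = None, -math.inf
    for step in range(101, 400):
        a = step / 100
        log_norm = math.log(sum(j ** -a for j in range(1, max_len + 1)))
        ll = sum(-a * math.log(k) - log_norm for k in lengths)
        if ll > best_ll:
            best_a, best_ll = a, ll
    return best_a, best_ll
```

Both models have one free parameter, so comparing log-likelihoods directly is equivalent to comparing AIC. On a heavy-tailed toy sample (many 1-bp indels plus a few of 20-40 bp), the Zipf model fits markedly better, because the geometric tail decays too quickly to explain the long indels.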


Subjects
Algorithms, Software, Bayes Theorem, Sequence Alignment, INDEL Mutation, Molecular Evolution
18.
Appl Biochem Biotechnol ; 196(1): 99-112, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37099126

ABSTRACT

An impaired DNA damage repair cascade can disrupt lens transparency under aging-associated oxidative stress. The aim of this study was to assess the association of a 30 bp indel mutation (rs28360071) in the XRCC4 gene with susceptibility to senile cataract. The study followed a case-control design with a total of n = 200 participants, divided equally into senile cataract patients and controls. Conventional polymerase chain reaction (PCR) was performed to genotype the XRCC4 (rs28360071) mutation. For statistical analysis, SPSS ® 20.0 software and the MedCal© and SNPStats© tools were used. The frequencies of the homozygous D/D genotype and the mutant D allele were higher in senile cataract patients than in controls. The XRCC4 (rs28360071) mutation was significantly associated with predisposition to senile cataract (χ2 = 13.96, adjusted OR = 2.29, 95% CI: 1.5-3.4, p < 0.001). The codominant model was suggested to be the best-fitting model. The mutant D/D genotype showed a significant association with LDL (adjusted OR = 1.67, 95% CI: 0.14-1.45, p = 0.03) and HDL (adjusted OR = 1.66, 95% CI: 0.92-2.31, p = 0.05) cholesterol, with a higher risk of senile cataract. The XRCC4 (rs28360071) mutation may serve as a potential biomarker for the prognosis of senile cataract. It could be used to measure interruption of the NHEJ repair pathway, indicating DNA damage in lens epithelial cells that could accelerate cataractogenesis with aging.
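
The odds ratios above are adjusted estimates from the study's statistical software, which a hand calculation cannot reproduce. For orientation only, the sketch computes an unadjusted odds ratio from a hypothetical 2×2 table, with the standard Woolf (log-scale) 95% confidence interval.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    2x2 table:       exposed   unexposed
        cases           a          b
        controls        c          d
    All cells must be non-zero; no continuity correction is applied.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For a hypothetical table with a = 20, b = 10, c = 10, d = 20, this gives OR = 4.0 with a 95% CI of roughly 1.37-11.7. Because the interval is built symmetrically around ln(OR), the point estimate always lies inside it, which makes this a quick sanity check on reported values.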


Subjects
Cataract, Single Nucleotide Polymorphism, Humans, Introns, Single Nucleotide Polymorphism/genetics, Genetic Predisposition to Disease, INDEL Mutation, Genotype, DNA Repair/genetics, DNA Repair Enzymes/genetics, Aging, Cataract/genetics, Case-Control Studies, DNA-Binding Proteins/genetics
19.
Nucleic Acids Res ; 52(D1): D1276-D1288, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37870454

ABSTRACT

Among the diverse sources of neoantigens (i.e. single-nucleotide variants (SNVs), insertions or deletions (Indels) and fusion genes), fusion gene-derived neoantigens are generally more immunogenic, have multiple targets per mutation and are more widely distributed across various cancer types. Fusion gene-derived neoantigens are therefore a potential source of highly immunogenic neoantigens and hold great promise for cancer immunotherapy. However, the lack of fusion protein sequence resources and knowledge has hindered this application. We introduce 'FusionNeoAntigen', a dedicated resource for fusion-specific neoantigens, accessible at https://compbio.uth.edu/FusionNeoAntigen. In this resource, we provide breakpoint-crossing neoantigens for ∼43K fusion proteins of ∼16K in-frame fusion genes from FusionGDB2.0. FusionNeoAntigen provides fusion gene information, corresponding fusion protein sequences, fusion breakpoint peptide sequences, fusion gene-derived neoantigen predictions, virtual screening between fusion breakpoint peptides carrying potential fusion neoantigens and human leucocyte antigens (HLAs), fusion breakpoint RNA/protein sequences for vaccine development, information on samples with fusion-specific neoantigens, potential CAR-T-targetable cell-surface fusion proteins and literature curation. FusionNeoAntigen will help in developing fusion gene-based immunotherapies. We will report all potential fusion-specific neoantigens from all possible open reading frames of ∼120K human fusion genes in future versions.
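
A breakpoint-crossing peptide is one that contains residues from both fusion partners, so it cannot occur in either normal protein. The sketch below enumerates such peptides of a given length from a fusion protein sequence. It is not the FusionNeoAntigen pipeline and does not predict HLA binding. The sequence and junction index are hypothetical; lengths of 8 to 11 residues are typical for HLA class I ligands.

```python
def junction_peptides(fusion_protein, junction, k):
    """All length-k peptides that span a fusion breakpoint.

    fusion_protein: amino-acid sequence of the in-frame fusion product.
    junction: index of the first residue from the 3' partner, so residues
    [0, junction) come from the 5' partner.
    A peptide starting at s spans the breakpoint when s < junction < s + k.
    """
    first = max(0, junction - k + 1)
    last = min(junction - 1, len(fusion_protein) - k)
    return [fusion_protein[s:s + k] for s in range(first, last + 1)]
```

For a hypothetical fusion `MKTAYIA` + `QRQISFVK` (junction index 7), there are seven 9-mers, each containing the junction dipeptide `AQ`. Running the function for k = 8 to 11 gives the candidate pool that a binding predictor would then score.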


Subjects
Neoplasm Antigens, Genetic Databases, Neoplasms, Oncogene Fusion Proteins, Humans, Neoplasm Antigens/genetics, HLA Antigens, INDEL Mutation, Mutation, Neoplasms/genetics, Oncogene Fusion Proteins/genetics
20.
Exp Eye Res ; 238: 109742, 2024 01.
Article in English | MEDLINE | ID: mdl-38040051

ABSTRACT

Keratoconus (KC) is the predominant primary ectatic disease of the cornea and necessitates corneal transplantation in some cases. While some loci associated with KC risk have been identified, understanding of the disease remains limited. Superoxide dismutase (SOD) enzymes play a crucial role in countering reactive oxygen species and protecting against oxidative stress (OS). Accordingly, the objective of this study was to investigate a potential association of a 50 base pair (bp) insertion/deletion (I/D) in the SOD1 promoter, located 1684 bp upstream of the SOD1 ATG, with KC in the Iranian population. Additionally, SOD activity, total antioxidant capacity (TAC, determined by the ferric reducing-antioxidant power assay), and malondialdehyde (MDA) levels were assessed. In this case-control study, genomic DNA was extracted from the blood cells of KC (n = 402) and healthy (n = 331) individuals, and genotypes were determined by PCR. SOD enzyme activity and MDA and TAC levels were also measured in the serum of the study groups. The I/I genotype was present in 84.23%, the I/D genotype in 15.06%, and the D/D genotype in 0.69% of both groups. A statistically significant relationship was observed between genotype and the TAC, MDA, and SOD1 activity indices (P < 0.05). Individuals with the D/D genotype exhibited decreased total antioxidant capacity, increased MDA, and decreased SOD1 enzyme activity (P < 0.05). Moreover, logistic regression analysis of KC development indicated that elevated MDA levels increased the risk of KC in the patient group compared with the healthy group, while higher SOD1 activity and greater TAC values decreased KC risk. Removal of the 50 bp fragment reduced SOD1 activity and elevated OS levels, thereby affecting the oxidant-antioxidant balance, which could play a significant role in individuals affected by KC.


Subjects
Keratoconus, Oxidative Stress, Superoxide Dismutase-1, Keratoconus/epidemiology, Keratoconus/genetics, Keratoconus/therapy, Case-Control Studies, Adolescent, Young Adult, Adult, Middle Aged, Humans, Male, Female, Superoxide Dismutase-1/genetics, Logistic Models, ROC Curve, INDEL Mutation